Lecture 8 Sample, Population, Random Variable and Probability

Concepts:

      Sample and population, description of a population by random variables, probability, calculation of probability for random variables, simple random sample, LLN and CLT, probability distribution of sample average

The statistical way of study is to learn about a population through collected sample data.

A key question in sampling is: How can we ensure that a collected sample is representative of a population when information about the population is yet unknown?

Being representative means that, for a population with an unknown proportion of red figures (see the following figure), if, for example, \(x\) out of seven figures are red, the same ratio should be present in the sample.

If \(x\) out of 7 figures are red and every figure in the population has an equal chance of being chosen, then a red figure can be expected to have a chance of \(x/7\) of being chosen. For a sample of size 70, we can expect roughly \(x \times 10\) figures to be red in the sample due to the equal chance principle. Therefore, equal chance in selection will preserve the proportion of red figures in the sample. This is the principle underlying a simple random sample.
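The equal-chance principle can be illustrated with a small simulation. The following is a sketch with a hypothetical population (the colours, the choice x = 3, and the seed are assumptions, not from the lecture):

```python
import random

random.seed(42)

# Hypothetical population of 7 figures, 3 of them red (x = 3).
population = ["red", "red", "red", "blue", "blue", "blue", "blue"]

# A sample of size 70 in which every figure has an equal, independent
# chance of being chosen on each draw (sampling with replacement).
sample = [random.choice(population) for _ in range(70)]

reds_in_sample = sample.count("red")
expected_reds = 70 * population.count("red") / 7   # x * 10 with x = 3

print(expected_reds, reds_in_sample)
```

The actual count of red figures in the sample will scatter around the expected count of 30, which is exactly the sample uncertainty discussed below.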

8.1 Principle of simple random sample

  • Each element has the same chance to be chosen.

  • The chances for elements to be chosen are independent of each other.

However, we also know that among the 70 data points, while the expected number of red figures is 10, the actual number can vary: it might be 11 or 9, for example. This is known as sample uncertainty. Similarly, if the population mean SALARY, i.e. the mean salary of all 300 thousand employees of the company, is $52.5K, the sample average of the 300 employees in our data can differ slightly from $52.5K.

Sample Uncertainty

In the following interactive diagram you can investigate sample uncertainty, i.e. changes in the sample mean and sample standard deviation, by varying the sample size, the number of samples and the population parameters.


Handling sample uncertainty is one of the main tasks in statistics.
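Sample uncertainty can also be seen in a simulation. The sketch below draws repeated simple random samples from a large simulated population (the population values and seed are assumptions chosen to mimic the salary example):

```python
import random
import statistics

random.seed(0)

# Hypothetical population: 300,000 salaries (in $K) centred at 52.5.
population = [random.gauss(52.5, 9.1) for _ in range(300_000)]

# Five simple random samples of size 300; each gives a slightly
# different sample average, although the population mean is fixed.
sample_means = [statistics.mean(random.sample(population, 300))
                for _ in range(5)]

print([round(m, 2) for m in sample_means])
```

Each run shows sample means close to, but rarely equal to, the population mean of 52.5.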

8.2 Description of a Population by random variables

How can we describe a population variable, such as the SALARY of all employees in a company?

Similar to a statistical variable, SALARY can take on many different values, with some values appearing more frequently than others. If we could collect data on the entire population, we could count the frequencies of these values to describe the population. However, since collecting data on the entire population is impractical, the question becomes: How can we describe the population variable, SALARY, without counting the frequencies?

The following graphs display the distribution of salaries with increasing sample sizes of 300, 2,000, 10,000, and 23,318.

As we increase the amount of data, the histogram increasingly resembles a smooth curve. This limiting curve is known as the probability density function (PDF).

Recall that the area of the bars in a histogram represents the relative frequencies of the data over corresponding intervals. If we know the mathematical formula of the curve, we can calculate the areas under the curve for any given interval, which represents the relative frequency for that interval. Thus, we can obtain the frequencies of the population variable by specifying the curve, rather than counting individual data points.

The task of describing a population variable is to determine this curve, known as the density function.

Gauss (1777-1855) solved the problem with the following function:

\[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \] This function is called the Gaussian density function. \(\mu\) is the centre of the density function and \(\sigma\) is the standard deviation of the density function. When \(\mu=0\) and \(\sigma=1\) the normal density function takes the following form \[f(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} \]

This curve is called the standard normal density function. The standard normal distribution is of particular interest because probabilities for any normal random variable can be transformed into probabilities for a standard normal random variable.
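The density formula above is straightforward to evaluate directly. A minimal sketch (the function name is mine):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density f(x) = exp(-(x-mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# At x = mu the standard normal density peaks at 1/sqrt(2*pi) ≈ 0.3989.
print(round(normal_pdf(0.0), 4))   # 0.3989
```

With the default arguments this is the standard normal density; passing other values of `mu` and `sigma` gives any Gaussian density.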

As described earlier, the density function can be seen as the limit of the histogram as the number of data points increases indefinitely. However, collecting such a large amount of data is often impractical. So, how can we interpret the density function in practical terms?

If we randomly select one employee, what can we infer about his SALARY? Based on the histogram, we know that a SALARY value is much more likely to fall within the interval (52K,53K) than in the interval (26K,27K), because the frequency of Salaries in (52K,53K) is much higher than that over the interval (26K,27K). Thus, the area under the density function over an interval gives the probability that the value of SALARY will fall within that interval.

The histogram of a statistical variable has the following two key properties:

  1. The total area under the histogram is equal to one.
  2. The area of the histogram over a given interval represents the relative frequency of the variable taking values within that interval.

Compared to the histogram of relative frequencies, the density function has similar properties:

  1. The total area under the density function is equal to one.
  2. The area under the density function over a given interval represents the probability that the SALARY of a randomly chosen employee falls within that interval.

The SALARY is considered a random variable, and the density function provides the probability that this random variable takes on values within any given interval.

Random variable vs. statistical variable


What is the difference between a random variable and statistical variable?

The Lotto number before the draw is a random number; the recorded winning numbers are values of a statistical variable.

Before the draw we do not know what the winning number will be. But we know that the probability of a particular number being the winning number is very low.

The number on each ball in a Lotto is not random; it is the random drawing process that makes the outcome a random variable. Likewise, the salary of each employee is not random, but if we select an employee randomly from the population, SALARY becomes a random variable.

8.3 Discrete random variables

If a random variable assumes only discrete values, it is called a discrete random variable (discrete RV). For example, rolling a die can be described using a discrete random variable. We describe a discrete random variable by (1) listing all possible values and (2) specifying the corresponding probabilities. These details are typically organized in a table called a probability distribution table.

(This is similar to how relative frequency tables are used for categorical variables.)

X \(x_{1}\) \(x_{2}\) \(x_{3}\) …… \(x_{N}\)
P \(p_{1}\) \(p_{2}\) \(p_{3}\) …… \(p_{N}\)

Here the random variable X can assume one of the N possible values \(x_{1},\ x_{2},x_{3},\ \ldots\ldots x_{N}\), with probabilities \(p_{1},\ p_{2},p_{3},\ \ldots\ldots p_{N}\), respectively.

Random variables can be used to describe random events, such as rolling a die. Each possible outcome of a random event is called a sample point. The set of all sample points is called the sample space, denoted by: S = {\(x_{1},\ x_{2},x_{3},\ \ldots\ldots x_{N}\)}. Each sample point \(x_{i}\) has a chance of occurring, measured by \(p_{i}\), which is called the probability of \(x_{i}\). Probability is a measure of the likelihood of each sample point occurring.

Formal requirement of probability:

  A. It must be scaled between 0 and 1.

\(0 \leq P(X = x_{i}) \leq 1\)

  B. For mutually exclusive events \(\{X = x_{i}\}\) and \(\{X = x_{j}\}\), \(i \neq j\), probabilities add.

\(P\left( \{X = x_i\} \bigcup \{X = x_j\} \right) = P\left( X = x_i \right) + P( X = x_j)\)

  C. The sum of the probabilities over all outcomes must equal 1.

\(\sum_{i = 1}^{N}P(X = x_{i}) = 1\)

There are three ways to assign probability:

  1. Classic Method: equal chance principle

  2. Frequency Method: based on frequency tables of data.

  3. Subjective Method: expert’s opinions.

Example 1 (rolling a die)

Let X be the number facing up when rolling a die. As the value of X is the result of a random event, X is a random variable. It can be described by the following probability table.

X 1 2 3 4 5 6
P 1/6 1/6 1/6 1/6 1/6 1/6

Following the classic method, because each face has the same chance, each outcome has equal probability. According to formal requirement C, all 6 probabilities add to one. We have

\[P\left( X = x_i \right) = \frac{1}{6}\hspace{2cm} \mbox{for $x_i$ = 1,2,...,6}\] According to requirement B we have:

\[P(X > 3)= P\Big( \{X=4\} \bigcup \{X=5\} \bigcup \{X=6\}\Big) = P(X=4)+P(X=5)+P(X=6) = \frac{1}{2}\]

\[P(X < 2) = P(X=1) = \frac{1}{6}\]
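These die probabilities can be checked by summing over the probability table. A sketch using exact fractions:

```python
from fractions import Fraction

# Probability table of a fair die: P(X = x) = 1/6 for x = 1, ..., 6.
table = {x: Fraction(1, 6) for x in range(1, 7)}

# Requirement C: probabilities over all outcomes sum to one.
total = sum(table.values())

# Requirement B: P(X > 3) adds the mutually exclusive outcomes 4, 5, 6.
p_gt_3 = sum(p for x, p in table.items() if x > 3)
p_lt_2 = table[1]   # P(X < 2) = P(X = 1)

print(total, p_gt_3, p_lt_2)   # 1 1/2 1/6
```

Using `Fraction` keeps the arithmetic exact, matching the hand calculation above.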

Example 2 (Lotto number)

How big is the probability of winning Lotto? As 7 numbers are drawn from 45 in Lotto, the number of possible ordered outcomes is:

\[45 \times 44 \times 43 \times 42 \times 41 \times 40 \times 39 = 228{,}713{,}284{,}800\]

Since the order of the drawn numbers does not matter, the number of distinct sets of 7 numbers is \(228{,}713{,}284{,}800/7! = 45{,}379{,}620\). The probability of winning with any chosen 7 numbers is therefore very low: \(1/45{,}379{,}620\).
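These counts can be reproduced with the standard library (`math.perm` and `math.comb` require Python 3.8+):

```python
import math

# Ordered draws of 7 numbers out of 45: 45 * 44 * ... * 39.
ordered = math.perm(45, 7)

# The order of the drawn numbers does not matter, so divide by 7!.
sets_of_7 = math.comb(45, 7)

print(ordered)     # 228713284800
print(sets_of_7)   # 45379620
```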

Example 3 (Binomial distribution)

We want to know the probability of the number of successes in n independent trials, each of which has a success probability of p. The number of successes is a random variable, and it can assume all integer values from 0 to n. We call this RV a binomially distributed random variable and denote it by:

\(X\ \sim B(n,p)\)

The probability can be calculated using the following formula

\(P(X = x) = \ \frac{n!}{x!(n - x)!}p^{x}{(1 - p)}^{n - x}\)

You can calculate the probability using the formula or using the Table of Binomial Probabilities.

We can calculate the Binomial distribution and produce the bar chart using Excel for \(X\ \sim B(10,0.8)\)
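As an alternative to Excel, the binomial formula can be evaluated directly. A sketch for \(X \sim B(10, 0.8)\) (the function name is mine):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = n!/(x!(n-x)!) * p^x * (1-p)^(n-x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Distribution of X ~ B(10, 0.8), as in the Excel bar chart.
pmf = [binom_pmf(x, 10, 0.8) for x in range(11)]

print(round(binom_pmf(8, 10, 0.8), 4))   # 0.302
```

The eleven probabilities in `pmf` sum to one, as required, and the bar chart of these values matches the Excel output described above.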

8.4. Relation between random variables

In a survey of 120 fund managers, each person was asked about his or her educational background and fund-management performance. The results are summarised in the following table.

  Outperform the market Not Outperform the Market Total
Top 10 MBA 20 10 30
Not Top 10 MBA 20 70 90
Total 40 80 120

Table 1

If one of the persons is randomly selected, the outcome can be described by two random variables: E for education background with E = 0 for top 10 MBA, and E = 1 for not top 10 MBA, and F for performance with F = 0 for outperform the market, and F = 1 for not outperform the market. Since the selection is random, E and F are two random variables. The joint probability of E and F is defined by the following probability table:

  F=0 F=1 Marginal probability
Top 10 MBA 1/6 1/12 1/4
Not Top 10 MBA 1/6 7/12 3/4
Marginal probability 1/3 2/3 1

8.4.1 Joint probability

The probability that two random variables assume specific values simultaneously is called the joint probability. The joint probabilities specified in the table above can be expressed as follows:

\[P\Big(\{E=0\}\bigcap \{F=0\}\Big)=\frac{1}{6}\] \[P\Big(\{E=0\}\bigcap \{F=1\}\Big)=\frac{1}{12}\] \[P\Big(\{E=1\}\bigcap \{F=0\}\Big)=\frac{1}{6}\] \[P\Big(\{E=1\}\bigcap \{F=1\}\Big)=\frac{7}{12}\]

8.4.2 Marginal Probability

In the survey example above, if one person is randomly selected, the probability that they outperform the market is 1/3, since 40 out of 120 people outperformed the market. This is denoted by \(P(F = 0) = 1/3\). This probability is called a marginal probability, as it is given by the column sum (or row sum) of the joint probability table.

\[P(F=0) = P\Big(\{F=0\} \bigcap \{E=0\}\Big) + P\Big(\{F=0\}\bigcap \{E=1\}\Big) = \frac{1}{6}+\frac{1}{6}=\frac{1}{3}\]
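The marginal probabilities can be computed by summing the joint table over the other variable. A sketch with exact fractions (the coding of E and F follows the table above):

```python
from fractions import Fraction as F

# Joint probability table: keys are (E, F); E = 0 means top 10 MBA,
# F = 0 means outperform the market.
joint = {(0, 0): F(1, 6), (0, 1): F(1, 12),
         (1, 0): F(1, 6), (1, 1): F(7, 12)}

# Marginal probability of F = 0: sum over both values of E.
p_F0 = joint[(0, 0)] + joint[(1, 0)]
# Marginal probability of E = 0: sum over both values of F.
p_E0 = joint[(0, 0)] + joint[(0, 1)]

print(p_F0, p_E0)   # 1/3 1/4
```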

8.4.3 Conditional Probability

The probability of an event can change when additional related information about the event is known. For example, in a multiple-choice question with 4 options, the probability of selecting the correct answer is initially 1/4. However, if it is known that option 1 is incorrect, the probability of choosing the correct answer increases to 1/3, as option 1 is eliminated.

Similarly, if it is known that a randomly selected person has a top 10 education background, the probability of that person outperforming the market changes to 2/3. This is because, among the 30 people with a top 10 MBA, 20 outperformed the market. We denote this conditional probability as \[P(F=0|E=0)=\frac{2}{3}\] Conditional probability can be calculated from the joint probability and the marginal probability: \[P(F=0|E=0) = \frac{P\Big(\{F=0\} \bigcap \{E=0\}\Big)}{P(E=0)}= \frac{1/6}{1/4} = \frac{2}{3}\] The formula above is sometimes also written as: \[P(F|E)=\frac{P(F\bigcap E)}{P(E)}\]

8.4.4 Independence

In the survey example mentioned earlier, information about a fund manager’s educational background is useful in predicting their performance. This is evident because there are differences in performance between fund managers with top 10 MBA degrees and those without. When such differences exist, we say the two random variables E (educational background) and F (performance) are dependent. Conversely, if performance were the same regardless of educational background, knowledge of education would not enhance performance prediction. In such a scenario, we would say E and F are independent.

Two random variables are independent if the conditional probability distribution equals the marginal probability distribution.

\[P(F|E) = P(F)\hspace{0.5cm}\Leftrightarrow \hspace{0.5cm}\mbox{F and E are independent.}\] Inserting the definition above into the previous equation we have for independent E and F:

\[P\Big(F\bigcap E\Big) = P(F)P(E)\] The following table specifies independent E and F. We observe that the joint probabilities can be calculated by multiplying the corresponding marginal probabilities using the formula above, and the odds of outperforming the market are the same in all groups.

  F=0 F=1 Marginal probability
Top 10 MBA 1/12 2/12 1/4
Not Top 10 MBA 1/4 1/2 3/4
Marginal probability 1/3 2/3 1
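The independence condition can be verified entry by entry for this table. A short sketch:

```python
from fractions import Fraction as F

# Joint table of the independent E and F above: keys are (E, F).
joint = {(0, 0): F(1, 12), (0, 1): F(2, 12),
         (1, 0): F(1, 4),  (1, 1): F(1, 2)}

# Marginals by summing rows / columns.
p_E = {e: joint[(e, 0)] + joint[(e, 1)] for e in (0, 1)}
p_F = {f: joint[(0, f)] + joint[(1, f)] for f in (0, 1)}

# Independence: every joint entry equals the product of its marginals.
independent = all(joint[(e, f)] == p_E[e] * p_F[f]
                  for e in (0, 1) for f in (0, 1))

print(independent)   # True
```

Running the same check on the first (dependent) survey table would print `False`, since for that table \(1/6 \neq (1/4)(1/3)\).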

8.5. Continuous random variables

Recalling the example of SALARY in Section 8.2, we describe a continuous random variable (RV) through (1) the possible values it may assume, and (2) its density function. The density function is the counterpart of the histogram: the area over an interval under the density function represents the probability for the RV to assume values within this interval.

8.5.1. Uniformly distributed random variables

Example 4 (uniform distribution)

\(X\sim\text{unif}(a,b)\) is a continuous random variable that is uniformly distributed over the interval [a,b]. This random variable is often used to represent our lack of knowledge about the possible outcome within an interval. The wider the interval, the less we know about the outcome; the narrower the interval, the more precisely we know the possible outcomes.

The density function is \(f(x) = \frac{1}{b - a}\) for \(a \leq x \leq b\), and \(f(x) = 0\) otherwise.

What is the probability for the RV X to assume the value within the interval of [a, (a+b)/2]?

\[P\left( a \leq X \leq \frac{a + b}{2} \right) = \frac{1}{b - a}\left( \frac{a + b}{2} - a \right) = \frac{1}{2}\]

The calculation above is based on the fact that the probability is the area under the density function. \(\frac{1}{b - a}\) is the height, and \((\frac{a + b}{2} - a)\) is the width.

Depending on the density function, evaluating such an area (an integral) is often a very difficult task. The uniform distribution is an exceptionally simple case.
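For the uniform density the area is just width times height, so the probability of any sub-interval is easy to compute. A sketch (the function name and the values a = 2, b = 10 are mine):

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ unif(a, b): overlap width times height 1/(b-a)."""
    lo, hi = max(lo, a), min(hi, b)
    return max(0.0, hi - lo) / (b - a)

# Probability that X falls in the lower half [a, (a+b)/2], here with a=2, b=10.
a, b = 2.0, 10.0
print(uniform_prob(a, (a + b) / 2, a, b))   # 0.5
```

The result 1/2 holds for any a and b, in agreement with the calculation above.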

8.5.2. Normally distributed random variables

Formally for a continuous random variable X with density function f(x) the probability for X to assume value within the interval of (a,b) can be calculated: \[P(a \le X \le b) = \int_a^bf(x)dx\]

For a normal random variable with \(X\sim N(\mu,\sigma^2)\), \[\small P(a \le X \le b) = \int_a^b\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx=\int_{-\infty}^b\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx-\int_{-\infty}^a\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx=P(X\le b)-P(X\le a) \] By the method of change of variables, substituting \(z=\frac{x-\mu}{\sigma}\), it follows that

\[\small \int_a^b\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx=\int_{-\infty}^{\frac{b-\mu}{\sigma}}\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz-\int_{-\infty}^{\frac{a-\mu}{\sigma}}\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz=P\Big(Z\le\frac{b-\mu}{\sigma}\Big)-P\Big(Z\le\frac{a-\mu}{\sigma}\Big) \] The equation above tells us that the probability calculation for any normal random variable reduces to a probability calculation for a standard normal random variable. The latter is, however, still a difficult task. In this unit we do not calculate the probability ourselves; instead we look it up in probability tables, where it has been calculated and tabulated by others, or use EXCEL.

Conventionally a standard normal random variable is denoted by Z. We have \(Z \sim N(0,1)\).

Calculation of probability for standard normal random variable using standard normal table:

\(P(Z \leq 0.0)\)

\(P(Z \leq 1)\)

\(P(Z \leq 1.12)\)

\[P(Z \leq 0.21)\]

\[P(Z \leq 1.72)\]
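Instead of the printed table, these standard normal probabilities can be computed with the error function from the standard library. A sketch:

```python
import math

def phi(z):
    """Standard normal CDF P(Z <= z), via the identity with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The table look-ups listed above.
for z in (0.0, 1.0, 1.12, 0.21, 1.72):
    print(z, round(phi(z), 4))
```

The printed values agree with the standard normal table to four decimal places.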

For \(X\ \sim N(52,\ 9.1^{2})\), i.e. mean 52 and standard deviation 9.1, using EXCEL:

\(P(X \leq 30)\) : norm.dist(30,52,9.1,TRUE)

\(P(X \leq 40)\) : norm.dist(40,52,9.1,TRUE)

\(P(X \leq 50)\) : norm.dist(50,52,9.1,TRUE)

\(P(40 < X \leq 60)\): norm.dist(60,52,9.1,TRUE)- norm.dist(40,52,9.1,TRUE)

\(P(X > 60)\) : 1- norm.dist(60,52,9.1,TRUE)

Using the standard normal table for non-standard normal random variables: \(X\ \sim N(52,\ 9.1^{2})\)

\[P(X < 30) = P(X - 52 < 30 - 52) = P\left( \frac{X - 52}{9.1} < \frac{30 - 52}{9.1} \right) = P(Z < - 2.417) = 0.0078\]

\[P(X < 40) = P(X - 52 < 40 - 52) = P\left( \frac{X - 52}{9.1} < \frac{40 - 52}{9.1} \right) = P(Z < - 1.318) = 0.0934\]

\[P(X < 50) = P(X - 52 < 50 - 52) = P\left( \frac{X - 52}{9.1} < \frac{50 - 52}{9.1} \right) = P(Z < - 0.219) = 0.4129\]

\[P(X > 60) = 1 - P(X - 52 < 60 - 52) = 1 - P\left( \frac{X - 52}{9.1} < \frac{60 - 52}{9.1} \right) = 1 - P(Z < 0.879) = 0.1894\]
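The same standardisation can be scripted as a check on the table look-ups (σ = 9.1 as above; small differences from the values quoted come from rounding z to the table's precision):

```python
import math

mu, sigma = 52.0, 9.1

def normal_cdf(x):
    """P(X <= x) for X ~ N(mu, sigma^2), standardised to z = (x - mu)/sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(normal_cdf(30), 4))       # P(X < 30), about 0.0078
print(round(1 - normal_cdf(60), 4))   # P(X > 60), about 0.19
```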

8.6 Numerical measures of random variables

Similar to describing statistical variables, we can use numerical measures to describe random variables. The two most commonly used numerical measures are the expected value and variance.

8.6.1 Expected value and Variance

For a discrete random variable X with the probability table:

X \(x_{1}\) \(x_{2}\) \(x_{3}\) …… \(x_{N}\)
P \(p_{1}\) \(p_{2}\) \(p_{3}\) …… \(p_{N}\)

The expected value of X, denoted by \(E(X)\), is the counterpart of the sample average.

\[E(X) = \sum_{i = 1}^{N}x_{i}p_{i}\] The variance of X, denoted by \(Var(X)\), is the counterpart of the sample variance.

\[Var(X) = \sum_{i = 1}^N\Big(x_i - E(X)\Big)^{2}p_i\]

For continuous random variable X with a density function \(f(x)\)

\[E(X) = \int_{-\infty}^\infty xf(x)dx = \lim_{N \to \infty } \sum_{i = -N}^{N}x_{i}f(x_i)\Delta x = \lim_{N \to \infty } \sum_{i = -N}^{N}x_{i}p_i \] \[Var(X) = \int_{-\infty}^\infty \Big(x-E(X)\Big)^2f(x)dx = \lim_{N \to \infty } \sum_{i = -N}^{N}\Big(x_{i}-E(X)\Big)^2f(x_i)\Delta x = \lim_{N \to \infty } \sum_{i = -N}^{N}\Big(x_i - E(X)\Big)^{2}p_i \] In statistics, we often prefer to use the square root of the variance as the measure of variation.

Because the unit of variance is the squared unit of the random variable, which can be difficult to comprehend, we often use the standard deviation instead of the variance.

Standard deviation of a random variable: \(SD(X) = \sigma_{X} = \sqrt{Var(X)}\).
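For a discrete RV these sums are direct to compute. A sketch for the fair die, using exact fractions:

```python
from fractions import Fraction

# Probability table of a fair die.
xs = [1, 2, 3, 4, 5, 6]
ps = [Fraction(1, 6)] * 6

# E(X) = sum x_i p_i and Var(X) = sum (x_i - E(X))^2 p_i.
ex = sum(x * p for x, p in zip(xs, ps))
var = sum((x - ex) ** 2 * p for x, p in zip(xs, ps))
sd = float(var) ** 0.5

print(ex, var, round(sd, 4))   # 7/2 35/12 1.7078
```

So a fair die has expected value 3.5, variance 35/12 and standard deviation about 1.71.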

Example 1 (Binomial random variable)

For binomial random variable \(X\sim B(n,p)\), we have

\(E(X) = \ np\)

\(Var(X) = np(1 - p)\)

\(SD(X) = \sqrt{np(1 - p)}\)

Example 2 (Normal random variables)

For a normal random variable \(X\sim N(\mu,\sigma^{2})\),

\(E(X) = \mu\)

\(Var(X) = \sigma^{2}\)

\(SD(X) = \sigma\).

Because the normal distribution depends only on two parameters \((\mu,\sigma)\)—the mean and the standard deviation—the distribution function is fully specified by these parameters.

One application of these numerical measures is the 68-95-99.7 rule. In fact, this rule holds exactly for normal random variables. Since we use the normal distribution to approximate the distribution of many statistical data sets, this is known as the empirical rule.

Another application is that we can use these numerical measures to transform a normal random variable into a standard normal random variable. As a result, we can use the standard normal probability table to calculate the probabilities for all normal random variables.

Assume \(X\sim N(\mu,\sigma^{2})\). Because \(X\) is a random variable, \(aX + b\) is also a random variable. The expected value and the variance of \(aX + b\) can be calculated as follows.

8.6.2 Rules of expected value, variance and covariance

\(E(aX + b) = \sum_{i = 1}^N\left( ax_i + b \right)p_i = \sum_{i = 1}^Nax_ip_i + \sum_{i = 1}^Nbp_i = a\sum_{i = 1}^Nx_ip_i + b\sum_{i = 1}^Np_i = aE(X) + b\)

\(Var(aX + b) =\sum_{i = 1}^N{\left( ax_i + b - E(aX+b) \right)^2}p_i = \sum_{i = 1}^N{{a^{2}(x}_{i} - E(X))^{2}}p_i = a^{2}Var(X)\)

So we have two useful formulas that also hold for continuous RVs:

\(E(aX + b) = aE(X) + b\)

\(\text{Var}(aX + b) = a^{2}Var(X)\)

Applying the formulas above to \(Z = \frac{X - \mu}{\sigma}\), we have:

\(E(Z) = E\left( \frac{X - \mu}{\sigma} \right) = \frac{1}{\sigma}E(X - \mu) = 0\)

\(\text{Var}(Z) = Var\left( \frac{X - \mu}{\sigma} \right) = \frac{1}{\sigma^{2}}\text{Var}(X) = 1\).

\(Z = \frac{X - \mu}{\sigma}\) has expected value 0 and variance 1. It is standard normal.

For two random variables we have population covariance and population correlation coefficient:

Covariance of two random variables:

\(Cov(X,Y) = \sum_{j = 1}^{M}\sum_{i = 1}^{N}(x_{i} - E(X))(y_{j} - E(Y))P(X = x_{i},Y = y_{j})\)

For independent random variables we have \(P(X = x_{i},Y = y_{j}) = P(X = x_{i})P(Y = y_{j})\). It then follows that the covariance of independent RVs is zero.

\(Cov(X,Y) = \sum_{j = 1}^{M}\sum_{i = 1}^{N}(x_{i} - E(X))(y_{j} - E(Y))P(X = x_{i})P(Y = y_{j}) = 0\)

The correlation coefficient of two random variables is defined as follows:

\(\rho_{X,Y} = \frac{Cov(X,Y)}{\sqrt{\text{Var}(X)Var(Y)}}\)
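The covariance and correlation definitions can be applied directly to the joint table of the fund-manager survey (Section 8.4). A sketch with exact fractions:

```python
from fractions import Fraction as F

# Joint table of (E, F) from the fund-manager survey: keys are (E, F).
joint = {(0, 0): F(1, 6), (0, 1): F(1, 12),
         (1, 0): F(1, 6), (1, 1): F(7, 12)}

# Marginals and expected values.
p_E = {e: joint[(e, 0)] + joint[(e, 1)] for e in (0, 1)}
p_F = {f: joint[(0, f)] + joint[(1, f)] for f in (0, 1)}
eE = sum(e * p for e, p in p_E.items())
eF = sum(f * p for f, p in p_F.items())

# Cov(E, F) = sum_i sum_j (e_i - E(E))(f_j - E(F)) P(E = e_i, F = f_j).
cov = sum((e - eE) * (f - eF) * joint[(e, f)]
          for e in (0, 1) for f in (0, 1))
var_E = sum((e - eE) ** 2 * p for e, p in p_E.items())
var_F = sum((f - eF) ** 2 * p for f, p in p_F.items())
rho = float(cov) / float(var_E * var_F) ** 0.5

print(cov, round(rho, 4))   # 1/12 0.4082
```

The nonzero covariance reflects the dependence between education and performance already seen in the conditional probabilities.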

Expected value and variance of a sum of random variables

\(E(X + Y) = E(X) + E(Y)\)

In words, the formula above states that the expected value of a sum of random variables is the sum of the expected values of the individual random variables.

\(Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y)\)

For independent random variables \(Cov(X,Y)=0\). This implies that the variance of the sum of these variables is the sum of their variances.

\(Var(X + Y) = Var(X) + Var(Y)\)
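The additivity of variances for independent RVs can be checked by simulation (the distributions, sample size and seed here are assumptions chosen for illustration):

```python
import random
import statistics

random.seed(1)

# Two independent random variables: Var(X) = 4, Var(Y) = 9.
n = 100_000
x = [random.gauss(0, 2) for _ in range(n)]
y = [random.gauss(0, 3) for _ in range(n)]

# Var(X + Y) should be close to Var(X) + Var(Y) = 13.
var_sum = statistics.variance(a + b for a, b in zip(x, y))
print(round(var_sum, 2))
```

The simulated variance of the sum lands close to 13, as the formula predicts; repeating with dependent variables would reveal the extra \(2Cov(X,Y)\) term.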

Review questions

  1. How do we describe a population of interest?

  2. We use tables, diagrams and numerical measures to describe a statistical variable. What are the three counterparts in describing a random variable?

  3. What is the difference between a random variable and a statistical variable?

  4. What is a simple random sample?

  5. Why is the principle of simple random sample useful?

  6. What is distribution of sample average?

  7. What does the central limit theorem say?